Development and experimental validation of an energy metabolism-related gene signature for diagnosing of osteoporosis

Osteoporosis is usually caused by excessive bone resorption and energy metabolism plays a critical role in the development of osteoporosis. However, little is known about the role of energy metabolism-related genes in osteoporosis. This study aimed to explore the important energy metabolism-related genes involved in the development of osteoporosis and develop a diagnosis signature for osteoporosis. The GSE56814, GSE62402, and GSE7158 datasets were downloaded from the NCBI Gene Expression Omnibus. The intersection of differentially expressed genes between high and low levels of body mineral density (BMD) and genes related to energy metabolism were screened as differentially expressed energy metabolism genes (DE-EMGs). Subsequently, a DE-EMG-based diagnostic model was constructed and differential expression of genes in the model was validated by RT-qPCR. Furthermore, a receiver operating characteristic curve and nomogram model were constructed to evaluate the predictive ability of the diagnostic model. Finally, the immune cell types in the merged samples and networks associated with the selected optimal DE-EMGs were constructed. A total of 72 overlapped genes were selected as DE-EMGs, and a five DE-EMG based diagnostic model consisting B4GALT4, ADH4, ACAD11, B4GALT2, and PPP1R3C was established. The areas under the curve of the five genes in the merged training dataset and B4GALT2 in the validation dataset were 0.784 and 0.790, respectively. Moreover, good prognostic prediction ability was observed using the nomogram model (C index = 0.9201; P = 5.507e−14). Significant differences were observed in five immune cell types between the high- and low-BMD groups. These included central memory, effector memory, and activated CD8 T cells, as well as regulatory T cells and activated B cells. A network related to DE-EMGs was constructed, including hsa-miR-23b-3p, DANCR, 17 small-molecule drugs, and two Kyoto Encyclopedia of Genes and Genomes pathways, including metabolic pathways and pyruvate metabolism. Our findings highlighted the important roles of DE-EMGs in the development of osteoporosis. Furthermore, the DANCR/hsa-miR-23b-3p/B4GALT4 axis might provide novel molecular insights into the process of osteoporosis development.


Data source
Osteoporosis-related datasets, including GSE56814, GSE62402, and GSE7158 [14][15][16] were downloaded from the NCBI Gene Expression Omnibus (GEO; https:// www.ncbi.nlm.nih.gov/) 17 .The GSE56814 dataset includes peripheral blood monocytes from 42 and 31 patients with high and low BMD, respectively.GSE62402 included peripheral blood monocytes from five patients with high BMD and five patients with low BMD.These two datasets were sequenced using the [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array.GSE7158 included peripheral blood monocytes from 14 and 12 patients with high and low BMD, respectively.The data were based on the [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array.

Selection of differentially expressed EM genes
The batch effects of the three datasets were first removed using the sva package (version 3.38.0,http:// www.bioco nduct or.org/ packa ges/ relea se/ bioc/ html/ sva.html) 18 in R3.6.1.The combined expression levels were then calculated.
The intersection of DEGs and EM genes was selected as differentially expressed energy metabolism genes (DE-EMGs).The methylation levels of the DE-EMGs were also determined.Finally, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis 20,21 were performed using the DAVID software (version 6.8, https:// david.ncifc rf.gov/) 22,23 .GO enrichment analysis was used to investigate the functions of the DE-EMGs, including biological processes (BP), cellular components (CC), and molecular functions (MF).P < 0.05 was defined as the threshold.

Screening of disease-related genes based on WGCNA algorithm
Weighed gene co-expression network analysis (WGCNA) is a bioinformatic algorithm for constructing co-expression networks.The network could identify modules associated with diseases and screen important pathogenic mechanisms or potential therapeutic targets.Modules related to disease status were screened based on genes detected in the merged dataset using the WGCNA package (version 1.61, https:// cran.r-project.org/ web/ packa ges/ WGCNA/ index.html) 24 in R3.6.1.WGCNA was performed by defining adjacency functions and module partitioning.The threshold for module partitioning screening was set as follows: the module set contained at least 150 genes and cutHeight = 0.995.
The selected DE-EMGs were mapped onto each WGCNA module.A hypergeometric algorithm was used to calculate the fold enrichment parameter and enrichment significance p-value of the differential genes in the module using the following formula: 25 .N represents all genes involved in the WGCNA of the algorithms, M represents the number of genes in each module obtained by the WGCNA algorithm, n represents the number of significantly differentially expressed genes filtered in Step 2, and k represents the number of significantly differentially expressed genes mapped to the corresponding module.
A diagnostic model based on the optimal DE-EMGs was constructed using a support vector machine in R3.6.1 e1071 (Version 1.6-8, https:// cran.r-project.org/ web/ packa ges/ e1071) on the merged training set.The receiver operating characteristic (ROC) curve of the merged training dataset and independent validation dataset (GSE13850) was constructed using R 3.6.1,pROC (Version 1.12.1, https:// cran.r-project.org/ web/ packa ges/ pROC/ index.html) 27 to verify the efficiency of the diagnostic model.
The alignment diagram, also known as the nomogram, integrates multiple predictive indicators using multivariate regression analysis.Here, these indicators were visualized on the same plane at a certain scale using scaled line segments to evaluate the interrelationships among the various variables in the predictive model 28 .The nomogram model and correct line chart were constructed using the RMS package (version 5.1-2; https:// cran.r-project.org/ web/ packa ges/ rms/ index.html) in R3.6.1.

Evaluation of immune features based on the ssGSEA algorithm
The microenvironment comprises fibroblasts, immune cells, the extracellular matrix, growth factors, inflammatory factors, and special physicochemical characteristics.Microenvironments significantly affect the diagnosis, survival outcomes, and clinical treatment sensitivity of diseases.Cells in the microenvironment can aggregate into different categories.In addition, there are complex and significant interactions between each cell  The proportion of immune cells in the low-and high-BMD samples was compared using the Kruskal-Wallis test.
Finally, the correlation between the expression levels of the optimized DE-EMGs used to construct the diagnostic models and important immune cells was calculated using the COR function in R3.6.1.

Network construction based on DE-EMGs
Construction of TF regulatory relationships Related TFs targeting DE-EMGs were explored using transcriptional regulatory relationships revealed by sentence-based text mining (TRRUST, https:// www.grnpe dia.org/ trrus t/) 30 .All the related TFs and their target genes were downloaded from the database.The DE-EMGs selected for diagnostic model construction and related TFs were screened for further studies.

Construction of chemical molecules and KEGG related to osteoporosis
Chemical drug molecules and KEGG pathways were explored using the Comparative Toxicogenomics Database 2023 update (http:// ctd.mdibl.org/) 34 .Thereafter, chemical drug molecules targeted by the DE-EMGs were selected for the diagnostic model.Finally, an integrated network of DE-EMGs was constructed for diagnostic models.

Reverse-transcription quantitative PCR (RT-qPCR)
Fresh peripheral blood was collected from eight low-BMD people (6 females, 2 males) and eight high-BMD people (4 females, 4 males).Total RNA from monocytes was extracted by Trizol reagent (GENSTAR Inc.Beijing, China) and was reverse transcribed to cDNA with Primer ScriptTM RT Reagent (Thermo Fisher Scientific, Waltham, MA, USA).Then, real-time PCR was performed on a RT-qPCR machine (Bio-Rad, Hercules, CA, United States) with a SYBR green detection system (TargetMol Chemicals Inc. Shanghai, China).GAPDH was used as internal control.The primers used are listed in Table 1.Written and informed consent was obtained from all participants.The study was approved by the Ethics Committee of Honghui Hospital, Xi'an Jiaotong Universty.

Statistical analysis
The bioinformatics analysis was performed by corresponding packages in R3.6.1.P < 0.05 or FDR < 0.05 was regarded as threshold for statistical significance level when applicable.For the experimental validation, data were analyzed by student's t test in GraphPad Prism 9.0 (Boston, MA, USA) with significance level of P < 0.05.

Ethics approval and consent to participate
Written and informed consent was obtained from all participants.The study was approved by the Ethics Committee of Honghui Hospital, Xi'an Jiaotong Universty.

Selection of DE-EMGs
Batch effects of the three datasets were removed.Next, the datasets were merged into one.The participants were then divided into high-and low-BMD groups, which included 61 and 48 samples, respectively.A total of 427 DEGs with high vs. low BMD were obtained.A volcano plot of DEGs is shown in Fig. 2A.A heatmap of DEGs showed different expression levels of DEGs in participants with high BMD compared with those with low BMD (Fig. 2B).
A comparison of the EM-related genes with DEGs revealed 72 overlapping genes that were identified as DE-EMGs.Furthermore, GO and KEGG pathway enrichment analysis were performed.Briefly, the identified DE-EMGs were significantly enriched in 43 GO terms (21 BPs, 10 CCs, 12 MFs), such as carbohydrate metabolic, xenobiotic metabolic, and ethanol catabolic processes (Fig. 3A).In contrast, 22 KEGG pathways, including metabolic pathways, xenobiotic metabolism by cytochrome P450, and drug metabolism by cytochrome P450, were enriched in these DE-EMGs (Fig. 3B).

Screening of osteoporosis-related genes based on the WGCNA algorithm
The value of the adjacency matrix weight parameter power was explored to satisfy the precondition of a scalefree network distribution.The range of the network construction parameters was first selected, and then the scale-free distribution topology matrix was calculated.
As shown in Fig. 4A, the value of power was selected when the square value of the correlation coefficient reached 0.9 for the first time (power = 18).The average node connectivity of the constructed co-expression network was 1, which fully conformed to the properties of small-world networks.The dissimilarity coefficient between gene points was then calculated, and a system clustering tree was obtained.The minimum number of genes for each module was set to 150, and the cut height was 0.995.Nine modules were obtained (Fig. 4B).The correlation between the BMD status of the samples and each module is shown in Fig. 4C.MEblue, MEbrown, and MEblack positively correlated with low BMD and negatively correlated with high BMD.The other five modules, namely MEturquoise, MEyellow, MEpink, MEred, and MEgrey, were positively correlated with high BMD and negatively correlated with low BMD.
The expression levels of the 12 DE-EMGs in high-and low-BMD groups are shown in Fig. 5A.Here, the expression levels of DE-EMGs in the high-BMD group were all significantly higher than those in the low-BMD group (P < 0.05).The relationships among these 12 DE-EMGs are shown in Fig. 5B.6C, the expression levels of B4GALT4, ADH4, ACAD11 and PPP1R3C were significantly increased (P < 0.01), while B4GALT2 was significantly decreased in high-BMD group (P < 0.05).The experimental validation results showed a relatively high consistency rate with bioinformatics study.A nomogram model (Fig. 7A) and corrected curves (Fig. 7B) were constructed to evaluate the efficiency of the diagnostic model based on the data from the merged training dataset.B4GALT4 made the greatest contribution to survival.Good predictive ability was observed in the model (C-index = 0.9201; P = 5.507 e−14 ).Furthermore, the areas under the curve (AUC) of the five genes was 0.784, which had the highest predictive value, followed by ADH4 (AUC = 0.768), PPP1R3C (AUC = 0.768), B4GALT4 (AUC = 0.762), ACAD11 (AUC = 0.757), and B4GALT2 (AUC = 0.727; Fig. 7C).
A nomogram model (Fig. 8A) and corrected curves (Fig. 8B) were constructed to evaluate the efficiency of the diagnostic model based on the validation dataset.B4GALT2 made the greatest contribution to survival.The predictive ability of the model showed similar performance to the ideal model (C-index = 0.7841, P = 0.0003513).ROC analysis revealed that the AUC of B4GALT2 was 0.790.This had the highest predictive value, followed by B4GALT4 (AUC = 0.770), 5 gene (AUC = 0.760), ADH4 (AUC = 0.730), ACAD11 (AUC = 0.720), and PPP1R3C (AUC = 0.630; Fig. 8C).The specificity, sensitivity and Youden index is displayed in Table 3.

Immune characteristics based on ssGSEA
The ratios of 28 immune cells were determined using ssGSEA.A heat map of the 28 immune cell distributions between the high-and low-BMD groups is shown in Fig. 9A.Significant differences in five immune cell types were observed between the high-and low-BMD groups, including central memory, effector memory, and activated CD8 T cells, as well as regulatory T cells and activated B cells.The relationship between the five selected DE-EMGs and immune cells was further analyzed (Fig. 9B).Activated B cells and CD8 T cells were positively correlated with the five selected DE-EMGs.Central memory CD8 T cells, regulatory T cells, and effector memory CD8 + T cells were negatively correlated with the five selected DE-EMGs.
Eight KEGG pathways related to the DE-EMGs were identified, including tyrosine metabolism, fatty acid degradation, and pyruvate metabolism pathways (Table 4).After comparing the 168 KEGG pathways involved in the CTD, two KEGG pathways were identified: hsa01100: metabolic pathways and hsa00620: pyruvate metabolism.The integrated network related to the five DE-EMGs and osteoporosis is shown in Fig. 10.

Discussion
Energy metabolism has attracted increasing attention in investigations of the pathophysiology of chronic diseases, including osteoporosis.In this study, 72 genes were selected as DE-EMGs, and a diagnostic model of five DE-EMGs were constructed: B4GALT4, ADH4, ACAD11, B4GALT2, and PPP1R3C.ROC and nomogram models confirmed good predictive ability based on the genes.Furthermore, five immune cell types showed significant differences between the high-and low-BMD groups: central memory CD8 + T cells, regulatory T cells, effector memory CD8 + T cells, activated B cells, and activated CD8 + T cells.Networks based on DE-EMGs showed that hsa-miR-23b-3p, DANCR, 17 small-molecule drugs, and two KEGG pathways, including metabolic pathways and pyruvate metabolism, might be involved in osteoporosis development.B4GALT4, ADH4, ACAD11, B4GALT2, and PPP1R3C may be valuable biomarkers for the development of osteoporosis.ADH4, a critical member of the ADH family, is a well-known prognostic biomarker for hepatocellular carcinoma and is involved in the metabolism of ethanol and retinol.Retinol levels are associated with the occurrence of osteoporosis 35,36 .Osteoporosis might be caused by poor nutrition.The upregulation of PPP1R3C expression significantly increases glycogen synthesis and storage 37 .B4GALT family genes are associated with multiple biological processes such as cell apoptosis and proliferation 38 .Moreover, ACAD11 is required when cells undergo glucose starvation and is involved in fatty acid oxidation 39 .Thus, we propose that B4GALT4, ADH4, ACAD11, B4GALT2, and PPP1R3C are involved in the development of osteoporosis, potentially by modulating metabolic pathways.
In this study, DANCR was identified as an important lncRNA in the network based on the five selected DE-EMGs.MiR-23b-3p and DANCR act as competing endogenous RNA targeting B4GALT4, whose roles have been previously investigated.Yasuoka et al. 40 showed that this gene was mainly involved in the regulation of notochord morphogenesis.Previous studies identified DANCR as an essential mediator of osteoblast differentiation.Further evidence demonstrates that blood mononuclear cells with upregulated DANCR expression could lead to osteoporosis, thereby increasing the secretion of IL-6 and TNF-α as well as bone resorbing activity 41 .MiR-23b-3p has been verified as a potential biomarker for osteoporosis because it participates in bone mineral density variation 42 .Moreover, the Wnt/β-catenin signaling pathway has been confirmed as a DANCR and miR-23b-3p targeted pathway in osteoporosis 43,44 .Our data predicted that metabolic pathways were enriched in the DANCR/hsa-miR-23b-3p/B4GALT4 axis.Osteoporosis is a metabolic bone disease, and its occurrence is related to various metabolic, genetic, and nutritional factors.Previous evidence has shown that a lower BMD is usually found in patients with metabolic syndrome than in those without metabolic syndrome 45 .Therefore, we hypothesized that the DANCR/hsa-miR-23b-3p/B4GALT4 axis may provide novel molecular insights into the development of osteoporosis.
The imbalance between bone formation and resorption is one of the causes of bone loss.Previous evidence has shown that bone resorption and formation can be modified by complex interactions among dendritic cells, B lymphocytes, and T lymphocytes 46 .Furthermore, several studies have confirmed the importance of T lymphocytes in the regulation of bone resorption, which might be mediated by various cytokines, such as IFNγ, TNF-α, and IL-6 47,48 .Patients with osteoporosis have a higher CD4 + /CD8 + ratio compared to the control group 49 .Our data showed that five immune cell types, central memory CD8 T, regulatory T, effector memory CD8 T, activated B, and activated CD8 + T cells, differed significantly between the high-and low-BMD groups.These data suggest that the ratio of CD8 T cells may be used to assess osteoporosis outcomes.
Our study performed a comprehensive analysis of important EMGs in osteoporosis and developed a diagnosis signature consisting 5 EMGs, including B4GALT4, ADH4, ACAD11, B4GALT2, and PPP1R3C with a relative higher AUC .The differential expression of genes in the model was validated by RT-qPCR.To the best of our knowledge, this is the first study to develop a diagnosis signature based on EMGs.However, there are some limitations in this study.First, the differential expression of the five EMGs were screened from three datasets with individual heterogeneity of background and only validated in a small sample size.Second, the diagnosis signature should still be validated in large sample-size clinical studies in future.Third, the ceRNA network established in this study is warranted for in vivo or in vitro experiments.Overall, this study constructed a model that can predict the incidence of osteoporosis and identified B4GALT4, ADH4, ACAD11, B4GALT2, and PPP1R3C as potential biomarkers for osteoporosis development.The ratio of CD8 T cells might enable the assessment of osteoporosis outcomes.Furthermore, the DANCR/hsa-miR-23b-3p/ B4GALT4 axis may provide novel molecular insights into osteoporosis development.However, further clinical studies with more participants should be conducted to verify this conclusion.

Figure 1 .
Figure 1.A flow chart of the present study.

Figure 2 .
Figure 2. Differentially expressed genes (DEGs) between high-and low-body mineral density (BMD).(A) Volcano plot of DEG selection.The blue and red dots represent significantly downregulated and upregulated DEGs, respectively.The black horizontal line represents a fold discovery rate (FDR) < 0.05, and the two vertical lines represent a | log2 fold change (FC) |> 0.263; (B) The heatmap of DEGs.Black and white bars represent the high-and low-BMD groups, respectively.

Figure 3 .
Figure 3. Functional enrichment analysis of differentially expressed energy metabolism genes (DE-EMGs).(A) The top 15 gene ontology items enriched by DE-EMGs; (B) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched by DE-EMGs.

Figure 4 .
Figure 4. Screening of osteoporosis-related genes based on the WGCNA algorithm.(A) An adjacency matrix weight parameter power selection graph and the schematic diagram of average connectivity of genes under different power parameters.The horizontal axis represents the weight parameter power, whereas the vertical axis represents the square of the correlation coefficients between log (k) and log (p (k)) in the corresponding network.The red line represents the standard line where the square value of the correlation coefficient reaches 0.9.The red line in the right graph indicates the average connectivity of network nodes under the weight parameter power of the adjacency matrix in the left figure; (B) A module division tree diagram, with each color representing different modules; (C) A module-trait related heatmap.

Figure 5 .
Figure 5.The expression levels of 12 DE-EMGs between high-and low-BMD.(A) Comparison of 12 DE-EMG expression levels between high and low BMD; (B) A heatmap display of the correlation of expression levels of 12 DE-EMGs.

Figure 6 .
Figure 6.Single-factor logistic regression analysis based on expression levels of 12 DE-EMGs.(A) The results of single-factor logistic regression analysis; (B) A LASSO parameter graph.(C) Experimental validation of five DE-EMGs in the diagnosis model.*, ** indicates P < 0.05 and P < 0.01.

Figure 7 .
Figure 7.A nomogram model and corrected curves based on data from the merged training dataset.(A) Nomogram of five DE-EMGs selected for constructing the diagnosis model; (B) A corrected line chart; (C) A receiver operating characteristic (ROC) curve.

Figure 9 .
Figure 9. Evaluation of sample immune features based on the ssGSEA algorithm.(A) A heatmap of 28 immune cell type distribution in high-and low-BMD groups; (B) The correction between five DE-EMGs selected for constructing a diagnosis model and immune cells.
type and other cells, as well as the infiltration patterns of some robust cells.Immunological signature gene sets were downloaded from the gene set enrichment analysis database (GSEA, http:// softw are.broad insti tute.org/ gsea/ index.jsp) to evaluate the immune infiltration types in the merged samples.Next, the immune infiltration types of the merged samples were evaluated using the GSVA package (Version 1.36.3) 29in R3.6.1.This was determined based on single-sample GSEA (ssGSEA, http:// www.bioco nduct or.org/ packa ges/ relea se/ bioc/ html/ GSVA.html).

Table 1 .
The primers used for RT-qPCR.

Table 2 .
The information on modules.

Table 4 .
Kyoto Encylopaedia of Genes and Genomes pathways enriched by five differentially expressed energy metabolism genes (DE-EMGs) selected for constructing diagnosis models.